In this lab, we will explore some of R’s functionality with spatial
data, with special attention given to the sf package.
For more information about sf, you can visit their website. Robin Lovelace’s
book Geocomputation with R (available online) is also a really
helpful and educational source for learning sf. For
some of the longer examples, it is highly recommended that you run the
code from an R script to save on re-typing.
Next, download the following files from Google Drive and move them to your
data folder:
Make sure to unzip the zip files so that R can access the content. Note that on Windows, you will need to right-click on the file and select ‘Extract files’.
Base R has no structure for spatial data, so you will need to install the following packages (you should have some of these from previous modules):
library(ggplot2)
library(terra)
library(RColorBrewer)
library(sf)
library(viridis)
sf is an R package designed to work with spatial data
organized as “simple features”, an ISO standard for spatial data. The
sf package is able to provide all the functionality it does
because it interfaces with three widely adopted programming standards:
PROJ, GDAL, and GEOS. These provide for coordinate reference systems,
reading and writing of spatial data, and geometric operations,
respectively, but more on this in a moment.
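If you're curious which versions of these libraries your sf installation links against, sf can report them (a quick check that is also useful when debugging installation issues):

```r
# Report the versions of GEOS, GDAL, and PROJ that sf is compiled against
library(sf)
sf_extSoftVersion()
```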
Note that all sf functions are prefixed with
st_ (a legacy of this R package’s origins in PostGIS, where
‘st’ means “spatial type”). If this is not already installed on your
computer, you’ll need to install it before going any further:
install.packages("sf")
A simple feature is, in the words of the sf authors, “a
formal standard (ISO 19125-1:2004) that describes how objects in the
real world can be represented in computers, with emphasis on the spatial
geometry of these objects” (ref). In
other words, it’s structured data that provides information about a
location in space, including its shape.
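To make this concrete, here is a minimal sf object built by hand from a single point geometry and one attribute (a sketch; the point and name are made up, and in practice you will usually read data from a file instead):

```r
library(sf)

# a geometry column holding one point, with a CRS attached
geom <- st_sfc(st_point(c(-111.89, 40.76)), crs = 4326)

# attach an attribute to get a one-row sf data.frame
city <- st_sf(name = "Salt Lake City", geometry = geom)
city
```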
One great advantage of sf is that the data is stored in
R’s memory as an extended dataframe. This means that all the functions
that you can apply to a dataframe should work with an sf
object. To demonstrate this, we’ll load a file of county outlines for
North Carolina (this file is included when you install the
sf package). First load the sf library:
library(sf)
Now load the shapefile using st_read(). This is a
generic function (more on this below) for loading spatial data, and will
work with most vector formats.
path_to_data <- system.file("shape/nc.shp", package="sf")
north_carolina <- st_read(path_to_data, quiet = TRUE)
north_carolina <- north_carolina[ , c("CNTY_ID", "NAME", "AREA", "PERIMETER")]
north_carolina
## Simple feature collection with 100 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
## First 10 features:
## CNTY_ID NAME AREA PERIMETER geometry
## 1 1825 Ashe 0.114 1.442 MULTIPOLYGON (((-81.47276 3...
## 2 1827 Alleghany 0.061 1.231 MULTIPOLYGON (((-81.23989 3...
## 3 1828 Surry 0.143 1.630 MULTIPOLYGON (((-80.45634 3...
## 4 1831 Currituck 0.070 2.968 MULTIPOLYGON (((-76.00897 3...
## 5 1832 Northampton 0.153 2.206 MULTIPOLYGON (((-77.21767 3...
## 6 1833 Hertford 0.097 1.670 MULTIPOLYGON (((-76.74506 3...
## 7 1834 Camden 0.062 1.547 MULTIPOLYGON (((-76.00897 3...
## 8 1835 Gates 0.091 1.284 MULTIPOLYGON (((-76.56251 3...
## 9 1836 Warren 0.118 1.421 MULTIPOLYGON (((-78.30876 3...
## 10 1837 Stokes 0.124 1.428 MULTIPOLYGON (((-80.02567 3...
Note that there are a few lines of metadata, and then each row represents one spatial feature, along with a set of associated information. You can summarize this somewhat verbose printout by noting that simple features fit a simple formula:
\[ sf = attributes + geometry + crs
\]
This formula also suggests the kinds of ways that you might interact
with an sf object by, for example, changing its crs, or
filtering based on its attributes (or geometry), or manipulating its
geometry.
Attributes are properties of a feature. In this case, the
features are counties in North Carolina, and their attributes are things
like name and area. In an sf data.frame, each
feature is a row, and each attribute is a column. In the
north_carolina object, for example, the first feature has
the name “Ashe” and its county ID is 1825.
A very special attribute column is called the geometry
(sometimes labeled ‘geom’ or ‘shape’), shown above in the last column.
It consists of a point or set of points (specifically, their
coordinates) that define the shape and location of the feature. The
simple feature standard includes 17 geometry types, 7 of which are
supported by sf: point, multipoint, linestring,
multilinestring, polygon, multipolygon, and geometry collection.
As mentioned already, these geometries are just a series of points:
point_one <- st_point(c(0, 3))
point_two <- st_point(c(5, 7))
a_line <- st_linestring(c(point_one, point_two))
If you print these geometries
point_one
## POINT (0 3)
a_line
## LINESTRING (0 3, 5 7)
you see that they are represented as a text string. This is the Well Known Text (WKT) standard for specifying geometries. It tells us what kind of geometry the feature is and lists its x-y coordinates separated by commas.
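The conversion works in both directions: st_as_text() turns a geometry into its WKT string, and st_as_sfc() parses a WKT string back into a geometry:

```r
library(sf)

wkt <- st_as_text(st_point(c(0, 3)))        # geometry -> WKT string
wkt

geom <- st_as_sfc("LINESTRING (0 3, 5 7)")  # WKT string -> geometry
st_geometry_type(geom)
```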
If you want to know what geometry type your simple feature contains, try:
st_geometry_type(a_line)
## [1] LINESTRING
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
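The same constructor pattern extends to the other geometry types. A polygon, for example, is built from a list of closed rings, each a matrix of coordinates whose first and last rows must match:

```r
library(sf)

# one closed ring: the first and last vertices are identical
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
a_square <- st_polygon(list(ring))
a_square
st_area(a_square)  # area in (unitless) coordinate units
```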
To show the type of the first feature in the North Carolina shapefile:
st_geometry_type(north_carolina[1, ])
## [1] MULTIPOLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
The final ingredient in a simple feature is its spatial or coordinate reference system (CRS). A CRS provides two crucial pieces of information: (i) what rule we use to assign coordinates to points and (ii) what datum to use. It is not an exaggeration to say that the CRS is the most important element of a simple feature, for without a CRS, the numbers in the geometry column are just that, numbers, rather than full-blooded spatial coordinates.
Understanding what a coordinate assignment rule does is beyond the scope of this lab, but the datum deserves some attention. In effect, it specifies three things: the location of the origin (the point with coordinates POINT (0 0)), the units of measurement (so that we know to interpret POINT (5 7) as being 5 meters east and seven meters north of the origin, or - worse - 5 feet east and 7 feet north), and the model of the Earth’s shape that the coordinates refer to.
As with the geometries, the standard for representing a CRS is WKT, though the easiest way to identify a CRS is to use its EPSG code. To find the EPSG code for a CRS, you can visit this website: spatialreference.org.
The most widely used CRS is the World Geodetic System 84 (WGS 84, a geographic system) whose EPSG code is 4326:
st_crs(4326)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
If you are familiar with the PROJ4-string syntax, you can retrieve that from a CRS with:
st_crs(4326)$proj4string
## [1] "+proj=longlat +datum=WGS84 +no_defs"
However, current open standards specified by PROJ and GDAL discourage the use of the PROJ4-string syntax in favor of WKT, so it is probably best to get used to the latter now.
There’s actually one more element to a simple feature, but it is not as vital as the others and is really already implicit in the geometry. That is the bounding box. This is an object defined by the spatial extent of the data: the minimum and maximum x and y coordinates. You can retrieve the bounding box of a simple feature this way:
st_bbox(north_carolina)
## xmin ymin xmax ymax
## -84.32385 33.88199 -75.45698 36.58965
There are myriad uses for the bounding box, though we need not dwell on them here.
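One common use, for instance, is converting the bounding box into a rectangular polygon, e.g. for cropping another layer or framing a plot. A sketch using the North Carolina file bundled with sf:

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# st_as_sfc turns a bbox into a rectangular polygon in the same CRS
frame <- st_as_sfc(st_bbox(nc))

plot(st_geometry(nc))
plot(frame, border = "red", add = TRUE)
```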
Reading and writing spatial data, it turns out, is quite the chore.
The solution sf relies on is to interface with GDAL, which
handles lots of different spatial data types (it’s kinda its whole
purpose). Currently supported (vector) spatial data types can be found
at GDAL.org.
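You can also ask sf directly which vector drivers your GDAL build supports:

```r
library(sf)

# list the vector formats GDAL can handle, and whether each is writable
drv <- st_drivers("vector")
head(drv[, c("name", "long_name", "write")])
```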
Perhaps the most common spatial data type - because ESRI is a thing - is
the shapefile, which has a .shp file extension.
In sf, the function for reading in spatial data is
st_read. Here is the nitty-gritty and, perhaps, needlessly
verbose version first:
#./data/utahcounty/utahcounty.shp
utah <- st_read(dsn = "./data/utahcounty/utahcounty.shp",
layer = "utahcounty",
drivers = "ESRI Shapefile")
## options: ESRI Shapefile
## Reading layer `utahcounty' from data source
## `/Users/u0784726/Dropbox/Data/devtools/ugic-intro-r/data/utahcounty/utahcounty.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 29 features and 13 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -114.0529 ymin: 36.99766 xmax: -109.0416 ymax: 42.00171
## CRS: NA
dsn stands for “data source name” and specifies where
the data is coming from, whether a file directory, a database, or
something else. layer is the layer in the data source to be
read in. Finally, drivers tells GDAL what format the file
is in or what structure it has, so it knows how to correctly interpret
the file. All of this information is printed to the console when you
execute st_read.
In this case, we are using a simple ESRI shapefile, so the data
source and layer are basically the same thing. Furthermore,
sf is good at guessing the driver based on the file
extension, so the driver does not normally need to be specified. Hence,
we could just as well have written:
utah <- st_read("./data/utahcounty/utahcounty.shp")
## Reading layer `utahcounty' from data source
## `/Users/u0784726/Dropbox/Data/devtools/ugic-intro-r/data/utahcounty/utahcounty.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 29 features and 13 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -114.0529 ymin: 36.99766 xmax: -109.0416 ymax: 42.00171
## CRS: NA
And here’s what this looks like. The combination of
st_geometry and plot extracts only the spatial
geometry (the polygons) and plots those. We’ll look more closely at
plotting these and making maps later.
plot(st_geometry(utah))
Sometimes you have spatial data, but it is not in a spatial data format. Usually, this means you have a table or spreadsheet with columns for the x and y coordinates.
wna_climate <- read.csv("./data/WNAclimate.csv")
head(wna_climate)
## WNASEQ LONDD LATDD ELEV totsnopt annp Jan_Tmp Jul_Tmp
## 1 1 -107.9333 50.8736 686 78.7869 326 -13.9 18.8
## 2 2 -105.2670 55.2670 369 145.3526 499 -21.3 16.8
## 3 3 -102.5086 41.7214 1163 42.6544 450 -4.2 23.3
## 4 4 -110.2606 44.2986 2362 255.1009 489 -10.9 14.1
## 5 5 -114.1500 59.2500 880 164.8924 412 -23.9 14.4
## 6 6 -120.6667 57.4500 900 141.9260 451 -17.5 13.8
This can be converted to a simple feature using the
st_as_sf function like so:
wna_climate <- st_as_sf(wna_climate,
coords = c("LONDD", "LATDD"),
crs = 4326)
wna_climate
## Simple feature collection with 2012 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -138 ymin: 29.8333 xmax: -100 ymax: 60
## Geodetic CRS: WGS 84
## First 10 features:
## WNASEQ ELEV totsnopt annp Jan_Tmp Jul_Tmp geometry
## 1 1 686 78.7869 326 -13.9 18.8 POINT (-107.9333 50.8736)
## 2 2 369 145.3526 499 -21.3 16.8 POINT (-105.267 55.267)
## 3 3 1163 42.6544 450 -4.2 23.3 POINT (-102.5086 41.7214)
## 4 4 2362 255.1009 489 -10.9 14.1 POINT (-110.2606 44.2986)
## 5 5 880 164.8924 412 -23.9 14.4 POINT (-114.15 59.25)
## 6 6 900 141.9260 451 -17.5 13.8 POINT (-120.6667 57.45)
## 7 7 1100 183.4187 492 -18.8 13.2 POINT (-119.7167 56.7167)
## 8 8 1480 191.5657 606 -11.0 12.8 POINT (-114.6 50.7667)
## 9 9 651 144.9015 443 -22.4 15.3 POINT (-122.1667 59.5167)
## 10 10 725 151.0778 455 -23.5 15.4 POINT (-112.1 57.7667)
The function just needs to know what columns the x and y coordinates are in and what CRS they are specified in. And here’s what it looks like:
plot(st_geometry(wna_climate), pch = 19, col = alpha("darkgreen", 0.5))
The sf function for writing simple features to disk is
st_write. It is almost an exact mirror of
st_read, but it also requires that you specify the simple
feature object in your R environment that you want to write to disk. If
the layer already exists, you will need to specify
delete_layer = TRUE.
st_write(obj = wna_climate,
dsn = "./wnaclim.shp",
layer = "wnaclim",
driver = "ESRI Shapefile")
or, more simply:
st_write(wna_climate, dsn = "./data/wnaclim.shp")
The cardinal rule for working with any spatial data is to make sure all of it is in the same CRS. This ensures that any analysis which combines multiple sources is correctly comparing values at the same locations. Never ever ever ever do anything with your data until you are sure you’ve got the CRS right.
The st_crs() function allows you to quickly check the
CRS for any object.
st_crs(utah)
## Coordinate Reference System: NA
Which shows as missing (NA), as there was no .prj file
with this shapefile. We can set this using an EPSG code. The shapefile
is in WGS84, and the EPSG code for this is 4326. There are two methods
to set the CRS for a spatial object: st_crs<- and
st_set_crs.
utah <- st_set_crs(utah, 4326)
st_crs(utah) <- 4326
If we now recheck the CRS, you’ll see that it is now fully informed in WKT format:
st_crs(utah)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
Note: this should only be used when the simple feature is missing a CRS and you know what it is. It is NOT for re-projecting the sf object to a new coordinate system.
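To see why this distinction matters, compare the two on a single point: st_set_crs only relabels the metadata and leaves the coordinate numbers untouched, while st_transform actually recomputes them (a sketch; the point is arbitrary):

```r
library(sf)

pt <- st_sfc(st_point(c(-111.9, 40.8)), crs = 4326)

relabeled <- st_set_crs(pt, 32612)    # same numbers, new (now wrong) label
projected <- st_transform(pt, 32612)  # numbers recomputed into UTM metres

st_coordinates(relabeled)  # unchanged lon/lat values
st_coordinates(projected)  # easting/northing in metres
```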
The st_transform() function allows you to project your
sf object to a new CRS. This is particularly useful if you have multiple
data sources with different original coordinate systems. Here, we’ll
reproject the Utah county data to UTM Zone 12N (EPSG code 32612):
utah <- st_transform(utah, crs = 32612)
st_crs(utah)
## Coordinate Reference System:
## User input: EPSG:32612
## wkt:
## PROJCRS["WGS 84 / UTM zone 12N",
## BASEGEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["UTM zone 12N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-111,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["Between 114°W and 108°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Alberta; Northwest Territories (NWT); Nunavut; Saskatchewan. Mexico. United States (USA)."],
## BBOX[0,-114,84,-108]],
## ID["EPSG",32612]]
As a reminder: when you read in spatial data, the first thing you
should do is check the CRS with st_crs, then re-project with
st_transform if necessary.
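A handy idiom is to compare the two CRSs directly and transform only when they differ, whatever the target CRS happens to be (sketched here with the bundled North Carolina file standing in for your own layers):

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
other <- st_transform(nc, 32617)  # pretend this layer arrived in another CRS

# align 'other' with nc's CRS only when the two differ
if (st_crs(other) != st_crs(nc)) {
  other <- st_transform(other, st_crs(nc))
}
st_crs(other) == st_crs(nc)
```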
You can also check the EPSG code (if specified):
st_crs(utah)$epsg
## [1] 32612
st_crs(wna_climate)$epsg
## [1] 4326
And you can get the name of a CRS this way:
format(st_crs(utah))
## [1] "WGS 84 / UTM zone 12N"
The attribute part of a sf object is a data.frame, so
you can use all the methods we have previously looked at for data
manipulation in working with attributes.
class(utah)
## [1] "sf" "data.frame"
If you enter the name of an sf object, it will print the
first few rows of the attribute table:
utah
## Simple feature collection with 29 features and 13 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 228583.4 ymin: 4094744 xmax: 673944.6 ymax: 4653572
## Projected CRS: WGS 84 / UTM zone 12N
## First 10 features:
## SP_ID COUNTYNBR ENTITYNBR ENTITYYR NAME FIPS STATEPLANE POP_LASTCE
## 1 0 13 2010131010 2010 KANE 25 South 7125
## 2 1 14 2010141010 2010 MILLARD 27 Central 12503
## 3 2 20 2010201010 2010 SANPETE 39 Central 27822
## 4 3 04 2010041010 2010 CARBON 7 Central 21403
## 5 4 25 2010251010 2010 UTAH 49 Central 516564
## 6 5 03 2010031010 2010 CACHE 5 North 112656
## 7 6 22 2010221010 2010 SUMMIT 43 North 36324
## 8 7 27 2010271010 2010 WASHINGTON 53 South 138115
## 9 8 10 2010101010 2010 GRAND 19 Central 9225
## 10 9 24 2010241010 2010 UINTAH 47 Central 32588
## POP_CURRES FIPS_STR COLOR4 Shape_Leng Shape_Area
## 1 7125 49025 4 559368.8 10631563850
## 2 12503 49027 3 576258.3 17708674898
## 3 27822 49039 2 317646.9 4146734071
## 4 21403 49007 1 333190.1 3844121594
## 5 516564 49049 3 481098.2 5544905821
## 6 112656 49005 2 290082.5 3035354927
## 7 36324 49043 1 477597.9 4870022623
## 8 138115 49053 1 347253.9 6297924699
## 9 9225 49019 1 516092.9 9539294359
## 10 32588 49047 2 540619.1 11661945600
## geometry
## 1 POLYGON ((425313.9 4154648,...
## 2 POLYGON ((393592.1 4378949,...
## 3 POLYGON ((448347.6 4407164,...
## 4 POLYGON ((495617.5 4407122,...
## 5 POLYGON ((449722.4 4491979,...
## 6 POLYGON ((414165.9 4623363,...
## 7 POLYGON ((447597 4510365, 4...
## 8 POLYGON ((281567.5 4165110,...
## 9 POLYGON ((667230.1 4373926,...
## 10 POLYGON ((651528.9 4524595,...
As this is a dataframe at heart, we can use the same functions we looked at in the previous lab to manipulate the data:
# method 1 (base R)
utah2 <- utah[ , c("NAME", "FIPS", "POP_CURRES")]
# method 2 (dplyr)
library(dplyr)
utah2 <- utah %>%
select(NAME, FIPS, POP_CURRES)
names(utah)
## [1] "SP_ID" "COUNTYNBR" "ENTITYNBR" "ENTITYYR" "NAME"
## [6] "FIPS" "STATEPLANE" "POP_LASTCE" "POP_CURRES" "FIPS_STR"
## [11] "COLOR4" "Shape_Leng" "Shape_Area" "geometry"
names(utah2)
## [1] "NAME" "FIPS" "POP_CURRES" "geometry"
Notice this very important difference between regular data.frames and
sf data.frames: when you subset by columns, even though you
do not explicitly state that you want to keep the geometry column, it
keeps that column anyway. In this sense, the geometry column is said to
be “sticky”.
Subsetting the data by rows works in the same way as before. So we
can carry out conditional selection of locations by using the usual
comparison operators (<, <=, ==, !=, >=, >).
For example, to select only the counties with over 50,000 people:
# method 1
utah3 <- utah[utah$POP_CURRES > 50000, ]
# method 2
utah3 <- utah2 %>%
filter(POP_CURRES > 50000)
plot(st_geometry(utah), col = alpha("gray", 0.2), pch = 19)
plot(st_geometry(utah3), col = "darkred", pch = 19, add = TRUE)
New variables can easily be appended to an existing sf
object using the following notation:
utah$area_km2 <- utah$Shape_Area / 1e6
# method 2
utah <- utah %>%
mutate(area_km2 = Shape_Area / 1e6)
names(utah)
## [1] "SP_ID" "COUNTYNBR" "ENTITYNBR" "ENTITYYR" "NAME"
## [6] "FIPS" "STATEPLANE" "POP_LASTCE" "POP_CURRES" "FIPS_STR"
## [11] "COLOR4" "Shape_Leng" "Shape_Area" "geometry" "area_km2"
If you need to extract any variable from a sf object to
a standard R vector, you can again use the standard notation. Note that
if you use select to specify the columns, you need to also
add st_drop_geometry to remove the geometry:
# method 1
area_km2 <- utah$area_km2
# method 2
area_km2 <- utah %>%
select(area_km2) %>%
st_drop_geometry()
area_km2$area_km2[1:10]
## [1] 10631.564 17708.675 4146.734 3844.122 5544.906 3035.355 4870.023
## [8] 6297.925 9539.294 11661.946
If you need only the geometry (the set of coordinates, or vector definitions), these can be extracted as follows:
geometry <- st_geometry(utah)
geometry
## Geometry set for 29 features
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 228583.4 ymin: 4094744 xmax: 673944.6 ymax: 4653572
## Projected CRS: WGS 84 / UTM zone 12N
## First 5 geometries:
## POLYGON ((425313.9 4154648, 425715.6 4154644, 4...
## POLYGON ((393592.1 4378949, 393813.7 4378945, 3...
## POLYGON ((448347.6 4407164, 448363.5 4407163, 4...
## POLYGON ((495617.5 4407122, 496010.1 4407117, 4...
## POLYGON ((449722.4 4491979, 449736.6 4491978, 4...
In case you just want the attributes, not the geometry:
attributes <- st_drop_geometry(utah)
head(attributes)
## SP_ID COUNTYNBR ENTITYNBR ENTITYYR NAME FIPS STATEPLANE POP_LASTCE
## 1 0 13 2010131010 2010 KANE 25 South 7125
## 2 1 14 2010141010 2010 MILLARD 27 Central 12503
## 3 2 20 2010201010 2010 SANPETE 39 Central 27822
## 4 3 04 2010041010 2010 CARBON 7 Central 21403
## 5 4 25 2010251010 2010 UTAH 49 Central 516564
## 6 5 03 2010031010 2010 CACHE 5 North 112656
## POP_CURRES FIPS_STR COLOR4 Shape_Leng Shape_Area area_km2
## 1 7125 49025 4 559368.8 10631563850 10631.564
## 2 12503 49027 3 576258.3 17708674898 17708.675
## 3 27822 49039 2 317646.9 4146734071 4146.734
## 4 21403 49007 1 333190.1 3844121594 3844.122
## 5 516564 49049 3 481098.2 5544905821 5544.906
## 6 112656 49005 2 290082.5 3035354927 3035.355
Note: this is actually a special sort of data.frame
called a tibble. It’s not important to know about here, but it
does print slightly differently.
Spatial operations are like attribute operations, but they work with the geometry column rather than the attributes. There are loads of these functions, but we will just review some of the more important ones here.
Spatial filtering is probably the biggest one. Basically, you are
taking one geometry and using it to filter other geometries. To
demonstrate this, we’ll first generate some random points within the
utah simple feature using st_sample:
set.seed(1234)
random_pnts <- st_sample(utah, size = 500)
random_pnts <- st_as_sf(random_pnts)
plot(st_geometry(utah))
plot(st_geometry(random_pnts),
col = alpha("red", 0.5),
pch = 19,
add = TRUE)
Now, we can use one geometry to filter out a second one. To obtain
just the points in, say, Salt Lake County, we first subset the Utah
sf object to extract only this county:
slcounty <- subset(utah, NAME == "SALT LAKE")
Then you can filter the points using this new object, either with
the st_filter() function, or by using the county as an index
in the [ , ] notation:
filtered_pnts <- st_filter(random_pnts, slcounty)
filtered_pnts <- random_pnts[slcounty, ]
plot(st_geometry(utah))
plot(st_geometry(filtered_pnts),
col = "red",
pch = 19,
add = TRUE)
Internally, st_filter assumes a “topological” or spatial
relationship defined by what the sf authors refer to as a
spatial predicate (the .predicate argument). The default
predicate is st_intersects, which keeps the geometries of the first
object that intersect the second. We can, however, specify other spatial
relationships. For example, to get all the points outside Salt
Lake:
filtered_pnts <- st_filter(random_pnts, slcounty, .predicate = st_disjoint)
Another useful predicate is st_is_within_distance, which
requires that you pass an additional distance (dist)
argument to the filter. The dist argument is in units
specified by the CRS, in this case meters.
filtered_pnts <- st_filter(random_pnts,
slcounty,
.predicate = st_is_within_distance,
dist = 50000)
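The predicates can also be used on their own: st_intersects returns, for each geometry in its first argument, the indices of the intersecting geometries in its second, and wrapping the result in lengths() gives a quick point-in-polygon count (a sketch using the North Carolina file bundled with sf, so it runs anywhere):

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
set.seed(1234)
pnts <- st_sample(nc, size = 200)

# one list element per county, holding the indices of the points inside it
hits <- st_intersects(nc, pnts)
nc$n_points <- lengths(hits)
summary(nc$n_points)
```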
With spatial operations, the geometry is preserved (mostly). With geometric operations, the whole point is to manipulate the geometry. Again, we are just going to hit the highlights. It is worth emphasizing that these operations will often behave differently depending on the geometry type.
st_centroid computes the geometric center of each feature:
the_heart_of_slc <- st_centroid(slcounty)
## Warning: st_centroid assumes attributes are constant over geometries
plot(st_geometry(slcounty))
plot(st_geometry(the_heart_of_slc), pch = 17,
col = "red", cex = 2, add = TRUE)
st_buffer expands (or, with a negative distance, shrinks) each geometry by a fixed distance, here 50 km:
the_heft_of_slc <- st_buffer(slcounty, dist = 50000)
st_union merges geometries and, when applied to polygons, dissolves interior borders.
utah_boundary <- st_union(utah)
plot(st_geometry(utah_boundary))
To cast a geometry is to change it from one geometry type to another. For example, to convert the boundary of Salt Lake County to points (the vertices of the polygon):
slc_points <- st_cast(slcounty, "POINT")
## Warning in st_cast.sf(slcounty, "POINT"): repeating attributes for all
## sub-geometries for which they may not be constant
plot(st_geometry(slc_points), col = "darkorange", pch = 19)
If we convert to a LINESTRING object, this acts to
separate out the individual polygons:
utah_lines <- st_cast(utah_boundary, "MULTILINESTRING")
utah_lines <- st_cast(utah_lines, "LINESTRING")
To make simple plots of an sf object, you can use R’s
base function plot():
plot(utah2)
Notice that it creates a separate plot for each attribute. If you would prefer to plot just the geometry, as we did above, wrap the object in st_geometry():
plot(st_geometry(utah2))
One of the easiest ways to improve on these base plots is to use
ggplot2. This package contains a special plotting geometry,
geom_sf, designed to work with sf objects.
(Note that geom_sf refers to a ggplot2
geometry, not a sf geometry. Confusing, right?)
We can now plot this by calling the ggplot() function
and adding the sf object with geom_sf:
ggplot() +
geom_sf(data = slcounty) +
theme_bw()
Note that even though the map is projected in UTM coordinates, the axes are set to latitude and longitude.
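If you would rather see projected coordinates on the axes, you can tell ggplot2 to draw the graticule in a different CRS via coord_sf()'s datum argument (sketched with the bundled North Carolina data in UTM zone 17N, since the Salt Lake layers come from local files):

```r
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_utm <- st_transform(nc, 32617)  # UTM zone 17N covers North Carolina

# datum sets the CRS used to draw the axis graticule
p <- ggplot() +
  geom_sf(data = nc_utm) +
  coord_sf(datum = st_crs(32617)) +
  theme_bw()
p
```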
Multiple layers can be added to a plot by adding additional
geom_sf functions. Here, we read in two additional
shapefiles: one containing the locations of light rail stations, and one
containing the light rail tracks.
lightrail <- st_read("./data/LightRail_UTA/LightRail_UTA.shp")
## Reading layer `LightRail_UTA' from data source
## `/Users/u0784726/Dropbox/Data/devtools/ugic-intro-r/data/LightRail_UTA/LightRail_UTA.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 38 features and 5 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 413241.3 ymin: 4486347 xmax: 429455.1 ymax: 4515254
## Projected CRS: NAD83 / UTM zone 12N
stations <- st_read("./data/LightRailStations_UTA/LightRailStations_UTA.shp")
## Reading layer `LightRailStations_UTA' from data source
## `/Users/u0784726/Dropbox/Data/devtools/ugic-intro-r/data/LightRailStations_UTA/LightRailStations_UTA.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 57 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 413245.8 ymin: 4486448 xmax: 429389.5 ymax: 4515197
## Projected CRS: NAD83 / UTM zone 12N
ggplot() +
geom_sf(data = slcounty) +
geom_sf(data = lightrail, col = 'blue') +
geom_sf(data = stations, col = 'darkorange', alpha = 0.6) +
theme_bw()
To label features, we can extract a feature’s coordinates with st_coordinates and pass them to geom_label. Here, we pull out the “Stadium” station and bind its coordinates on as columns:
stadium = stations %>%
filter(STATIONNAM == "Stadium")
stadium = cbind(stadium, st_coordinates(stadium))
ggplot() +
geom_sf(data = slcounty) +
geom_sf(data = lightrail, col = 'blue') +
geom_sf(data = stations, col = 'darkorange', alpha = 0.6) +
geom_label(data = stadium, aes(X, Y, label = STATIONNAM), size = 2.5) +
theme_bw()
We can create thematic maps by mapping a variable to the fill
aesthetic inside the geom_sf() function:
names(utah2)
## [1] "NAME" "FIPS" "POP_CURRES" "geometry"
ggplot() +
geom_sf(data = utah2, aes(fill = POP_CURRES)) +
theme_bw()
The county populations are strongly skewed, so a log-transformed color scale with hand-picked breaks makes the differences easier to see:
my_breaks = c(0, 10000, 100000, 1000000)
ggplot() +
geom_sf(data = utah2, aes(fill = POP_CURRES)) +
scale_fill_continuous(trans = "log",
breaks = my_breaks, labels = my_breaks) +
theme_bw()
Here, we will use the viridis color scale, which is
colorblind safe. This comes with several color palette
options.
ggplot() +
geom_sf(data = utah2, aes(fill = POP_CURRES)) +
scale_fill_viridis(option = "viridis") +
theme_bw()
ggplot() +
geom_sf(data = utah2, aes(fill = POP_CURRES)) +
scale_fill_viridis(option = "magma", trans = "log",
breaks = my_breaks, labels = my_breaks) +
theme_bw()